SMT at the International Maritime Organization: experiences with combining in-house corpora with out-of-domain corpora
Abstract
This paper presents a machine translation tool, based on Moses, developed for the International Maritime Organization (IMO) for the automatic translation of documents from Spanish, French, Russian and Arabic to/from English. The main challenge lies in the insufficient size of in-house corpora (especially for Russian and Arabic). The United Nations (UN) granted IMO the right to use UN resources, and we describe the experiments we ran and the results we obtained with different translation model combination techniques. While BLEU results remain inconclusive for combinations, we also analyze user preferences for certain models (when choosing between IMO-only models and models combined with UN data). Translators perceive the combined models as much better for general texts, while IMO-only models seem better for technical texts.
Similar resources
Comparison of SMT and NMT trained with large Patent Corpora: Japio at WAT2017
Japan Patent Information Organization (Japio) participates in patent subtasks (JPC-EJ/JE/CJ/KJ) with phrase-based statistical machine translation (SMT) and neural machine translation (NMT) systems which are trained with its own patent corpora in addition to the subtask corpora provided by organizers of WAT2017. In EJ and CJ subtasks, SMT and NMT systems whose sizes of training corpora are about...
A Cross-linguistic and Cross-cultural Study of Epistemic Modality Markers in Linguistics Research Articles
Epistemic modality devices are believed to be one of the prominent characteristics of research articles as the commonly used genre among the academic community members. Considering the importance of such devices in producing and comprehending scientific discourse, this study aimed to cross-culturally and cross-linguistically investigate epistemic modality markers as an important subcategory...
Genre Analysis of ELT and Nursing Academic Written Discourse through Introduction
Since Swales’ (1981, 1990) work on the CARS model of the move structure of research articles, many studies on genre analysis have been carried out, among which work on different parts of research articles in various disciplines has produced a considerable literature. This study aims to investigate the rhetorical structure of the Introduction sections of articles in two fields of English Language Teaching (...
Improving Persian-English Statistical Machine Translation: Experiments in Domain Adaptation
This paper documents recent work carried out for PeEn-SMT, our Statistical Machine Translation system for translation between the English-Persian language pair. We give details of our previous SMT system, and present our current development of significantly larger corpora. We explain how recent tests using much larger corpora helped to evaluate problems in parallel corpus alignment, corpus cont...
Combining Bilingual and Comparable Corpora for Low Resource Machine Translation
Statistical machine translation (SMT) performance suffers when models are trained on only small amounts of parallel data. The learned models typically have both low accuracy (incorrect translations and feature scores) and low coverage (high out-of-vocabulary rates). In this work, we use an additional data resource, comparable corpora, to improve both. Beginning with a small bitext and correspon...